Memory-Efficient Topic Modeling

Authors

  • Jia Zeng
  • Zhi-Qiang Liu
  • Xiao-Qin Cao
Abstract

As one of the simplest probabilistic topic modeling techniques, latent Dirichlet allocation (LDA) has found many important applications in text mining, computer vision and computational biology. Recent training algorithms for LDA can be interpreted within a unified message passing framework. However, message passing requires storing previous messages in a large amount of memory space, increasing linearly with the number of documents or the number of topics. This high memory usage is therefore often a major problem for topic modeling of massive corpora containing a large number of topics. To reduce the space complexity, we propose a novel algorithm for training LDA that does not store previous messages: tiny belief propagation (TBP). The basic idea of TBP is to relate message passing algorithms to non-negative matrix factorization (NMF) algorithms, absorbing the message update into the message passing process and thus avoiding storing previous messages. Experimental results on four large data sets confirm that TBP performs comparably well or even better than current state-of-the-art training algorithms for LDA, but with much lower memory consumption. TBP can perform topic modeling when massive corpora cannot fit in computer memory, for example, extracting thematic topics from 7GB PUBMED corpora on a common desktop computer with 2GB memory.
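The abstract only sketches the link between message passing and NMF that lets TBP drop stored messages. As a rough illustration of that connection (a minimal sketch of generic KL-divergence NMF with Lee-Seung multiplicative updates on a toy document-word count matrix, not the authors' actual TBP algorithm; the matrix sizes and topic count below are made up), note that only the two factor matrices are kept in memory, while the per-entry "responsibilities" are recomputed inside each update rather than stored:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy document-word count matrix: 6 documents, 8 vocabulary words.
# A tiny epsilon keeps the log in the KL divergence well defined.
X = rng.poisson(2.0, size=(6, 8)).astype(float) + 1e-12

D, W = X.shape
K = 3  # number of topics (illustrative choice)

# Only these two factor matrices are stored; no per-message buffer
# whose size grows with documents x topics x words is ever kept.
theta = rng.random((D, K)) + 0.1  # document-topic weights
phi = rng.random((K, W)) + 0.1    # topic-word weights

def kl_divergence(X, theta, phi):
    """Generalized KL divergence between X and its reconstruction."""
    R = theta @ phi
    return float(np.sum(X * np.log(X / R) - X + R))

losses = []
for it in range(200):
    # Multiplicative updates for KL-divergence NMF (Lee & Seung).
    # The ratio X / R plays the role of the recomputed messages:
    # it is formed on the fly and discarded after each update.
    R = theta @ phi
    theta *= (X / R) @ phi.T / phi.sum(axis=1)
    R = theta @ phi
    phi *= theta.T @ (X / R) / theta.sum(axis=0)[:, None]
    losses.append(kl_divergence(X, theta, phi))
```

These multiplicative updates are monotone non-increasing in the KL objective, so the loss shrinks over iterations while memory stays at the size of `theta` plus `phi` regardless of how the updates are scheduled.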


Similar papers

Towards Big Topic Modeling

To solve the big topic modeling problem, we need to reduce both time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on the multi-processor architecture have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalabi...


Efficient Modeling and Simulation of a Virtually Shared Memory Architecture

Modeling and simulation have been assigned crucial roles in the design, development, analysis and evaluation of computer architectures. The design of parallel architectures in particular is a complex and difficult endeavor that makes modeling and simulation essential tools. In this case, high simulation performance is a prerequisite, since large workloads need to be simulated for a realistic an...


Efficient Execution of Process Networks

Kahn process networks (KPNs) [1] are a popular modeling technique for media- and signal-processing applications. A KPN makes parallelism and communication in an application explicit; thus, KPNs are a modeling paradigm that is very suitable for multi-processor architectures. We present techniques for the efficient execution of KPNs, taking into account both execution time and memory usage.


Modeling A New Architecture Of Area Delay Efficient 2-D FIR Filter Using VHDL

This paper presents the memory footprint and combinational complexity of the two-dimensional finite impulse response (FIR) filter, in order to obtain a systematic design strategy for area-delay-power-efficient architectures. Based on memory sharing and memory reuse, along with suitable scheduling of the computational design of the storage architecture, the separable and non-separable filters with less memory foot...


Modeling storage and retrieval of memories in the brain

We have proposed a neural network model that stores the incoming information after orthogonalizing it in the same manner as vectors are orthogonalized. The scheme enables the brain to compare a new informational system with those in the memory and store its similarities and differences with the old memories in an economical manner. This allows the brain to have an enormous capacity and yet the ...



Journal:
  • CoRR

Volume abs/1206.1147  Issue 

Pages  -

Publication date 2012